Skip to content

feat(ibis): Add Athena connector #1207

Merged
goldmedal merged 9 commits intoCanner:mainfrom
douenergy:AWS-Athena
Jun 4, 2025
Merged

feat(ibis): Add Athena connector #1207
goldmedal merged 9 commits intoCanner:mainfrom
douenergy:AWS-Athena

Conversation

@douenergy
Copy link
Copy Markdown
Contributor

@douenergy douenergy commented Jun 2, 2025

Description

  • Introduce AWS Athena datasource connector (Trino dialet).
  • The usage is the same as other datasource connector.

Summary by CodeRabbit

  • New Features

    • Added support for AWS Athena as a new data source, including connection configuration, metadata browsing, and querying capabilities.
    • Users can now query, validate, and retrieve metadata from Athena databases via the API.
  • Tests

    • Introduced comprehensive automated tests for Athena connector endpoints, covering query execution, validation, metadata retrieval, function execution, and error handling.
    • Added test configurations and fixtures to streamline Athena test setup and execution.
  • Chores

    • Updated dependencies to include Athena support and marked Athena-specific tests for easier management.
    • Enhanced CI workflow to run Athena-related tests when labeled accordingly.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Jun 2, 2025

Walkthrough

The changes introduce full support for AWS Athena as a data source in the application. This includes new model classes, metadata handling, connection logic, and comprehensive tests. The CI workflow and project configuration are updated to enable Athena-specific testing and dependencies, ensuring Athena integration is properly validated and maintained.

Changes

File(s) Change Summary
.github/workflows/ibis-ci.yml Added a conditional CI step to run Athena tests when the PR is labeled "athena", setting environment variables and executing Athena-specific tests.
ibis-server/app/model/init.py Added AthenaConnectionInfo and QueryAthenaDTO classes for Athena connection modeling; updated ConnectionInfo union to include Athena.
ibis-server/app/model/data_source.py Added Athena support to DataSource and DataSourceExtension enums; implemented static method to create Athena connections using Ibis; imported Athena DTOs.
ibis-server/app/model/metadata/athena.py Introduced AthenaMetadata class for Athena-specific metadata retrieval, including methods for table listing, constraints (returns empty), and version info. Added internal helpers for data type mapping and table name formatting.
ibis-server/app/model/metadata/factory.py Registered AthenaMetadata in the metadata factory mapping for Athena data sources.
ibis-server/pyproject.toml Added "athena" extra to ibis-framework dependency and introduced "athena" pytest marker for Athena-related tests.
ibis-server/tests/routers/v2/connector/test_athena.py Added comprehensive async pytest-based tests for Athena connector API endpoints, covering query execution, validation, metadata retrieval, and error handling.
ibis-server/tests/routers/v3/connector/athena/conftest.py Added pytest fixtures for Athena connection info and remote function list path setup; automatically mark tests in this directory with "athena".
ibis-server/tests/routers/v3/connector/athena/test_functions.py Added tests for Athena connector functions endpoint, including function list retrieval and execution of scalar and aggregate SQL functions.
ibis-server/tests/routers/v3/connector/athena/test_query.py Added tests for Athena query endpoint covering successful queries, limit handling, validation errors, dry-run mode, and error scenarios.
wren-core-base/manifest-macro/src/lib.rs Extended DataSource enum with new variant Athena and serde alias "athena".
wren-core-base/src/mdl/manifest.rs Updated Display trait for DataSource enum to support formatting the Athena variant as "ATHENA".
ibis-server/app/model/connector.py Added use of contextlib.closing to ensure proper closing of cursors for MSSqlConnector and PostgreSQL type name retrieval.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant API
    participant AthenaMetadata
    participant Athena (AWS)
    participant S3

    Client->>API: Send query/metadata request with Athena connection info
    API->>AthenaMetadata: Instantiate with connection info
    AthenaMetadata->>Athena: Connect using provided AWS credentials & S3 staging dir
    AthenaMetadata->>Athena: Execute SQL (e.g., information_schema query)
    Athena-->>AthenaMetadata: Return results
    AthenaMetadata-->>API: Process and return structured response
    API-->>Client: Respond with query/metadata results
Loading

Possibly related PRs

Suggested labels

core, rust

Suggested reviewers

  • goldmedal

Poem

🐇 Hopping through clouds with data bright,
Athena joins the backend flight.
Secrets safe and queries fast,
Metadata fetched, connections cast.
Tests run smooth, CI’s delight,
The rabbit cheers with all its might! 🥕✨

✨ Finishing Touches
  • 📝 Generate Docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@github-actions github-actions bot added ibis dependencies Pull requests that update a dependency file python Pull requests that update Python code ci labels Jun 2, 2025
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (3)
ibis-server/app/model/__init__.py (1)

105-124: Consider removing redundant alias for schema_name field.

The AthenaConnectionInfo class is well-structured with appropriate AWS/Athena-specific fields and proper use of SecretStr for security. However, the schema_name field has a redundant alias that matches the field name.

Consider this minor optimization:

-    schema_name: SecretStr = Field(
-        alias="schema_name",
-        description="The database name in Athena",
-        examples=["default"],
-    )
+    schema_name: SecretStr = Field(
+        description="The database name in Athena",
+        examples=["default"],
+    )
🧰 Tools
🪛 Pylint (3.3.7)

[convention] 105-105: Missing class docstring

(C0115)


[refactor] 105-105: Too few public methods (1/2)

(R0903)

ibis-server/app/model/metadata/athena.py (2)

115-115: Update comment to reflect Athena instead of MySQL.

The comment references MySQL documentation which is incorrect for Athena.

-            # Numeric Types(https://dev.mysql.com/doc/refman/8.4/en/numeric-types.html)
+            # Numeric Types for Athena/Trino

131-131: Update comment to reflect Athena instead of MySQL.

The comment references MySQL documentation which is incorrect for Athena.

-            # Date and Time Types(https://dev.mysql.com/doc/refman/8.4/en/date-and-time-types.html)
+            # Date and Time Types for Athena/Trino
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between de73780 and a3f1a6e.

⛔ Files ignored due to path filters (1)
  • ibis-server/poetry.lock is excluded by !**/*.lock
📒 Files selected for processing (7)
  • .github/workflows/ibis-ci.yml (1 hunks)
  • ibis-server/app/model/__init__.py (3 hunks)
  • ibis-server/app/model/data_source.py (5 hunks)
  • ibis-server/app/model/metadata/athena.py (1 hunks)
  • ibis-server/app/model/metadata/factory.py (2 hunks)
  • ibis-server/pyproject.toml (2 hunks)
  • ibis-server/tests/routers/v2/connector/test_athena.py (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (3)
ibis-server/app/model/metadata/factory.py (2)
ibis-server/app/model/metadata/athena.py (1)
  • AthenaMetadata (17-139)
ibis-server/app/model/data_source.py (1)
  • DataSource (44-70)
ibis-server/app/model/data_source.py (1)
ibis-server/app/model/__init__.py (2)
  • AthenaConnectionInfo (105-123)
  • QueryAthenaDTO (39-40)
ibis-server/tests/routers/v2/connector/test_athena.py (2)
wren-core-base/manifest-macro/src/lib.rs (1)
  • manifest (26-56)
ibis-server/tests/conftest.py (1)
  • client (18-23)
🪛 Pylint (3.3.7)
ibis-server/app/model/metadata/athena.py

[convention] 45-45: Line too long (103/100)

(C0301)


[convention] 1-1: Missing module docstring

(C0114)


[error] 3-3: Unable to import 'pandas'

(E0401)


[convention] 17-17: Missing class docstring

(C0115)

ibis-server/app/model/__init__.py

[convention] 39-39: Missing class docstring

(C0115)


[refactor] 39-39: Too few public methods (0/2)

(R0903)


[convention] 105-105: Missing class docstring

(C0115)


[refactor] 105-105: Too few public methods (1/2)

(R0903)

ibis-server/app/model/data_source.py

[convention] 45-45: Class constant name "athena" doesn't conform to UPPER_CASE naming style

(C0103)


[convention] 74-74: Class constant name "athena" doesn't conform to UPPER_CASE naming style

(C0103)


[convention] 105-105: Missing function or method docstring

(C0116)

ibis-server/tests/routers/v2/connector/test_athena.py

[convention] 43-43: Line too long (105/100)

(C0301)


[warning] 107-107: ## fixme this should be float64

(W0511)


[convention] 1-1: Missing module docstring

(C0114)


[error] 4-4: Unable to import 'orjson'

(E0401)


[convention] 11-11: Constant name "base_url" doesn't conform to UPPER_CASE naming style

(C0103)


[convention] 73-73: Missing function or method docstring

(C0116)


[convention] 77-77: Missing function or method docstring

(C0116)


[warning] 77-77: Redefining name 'manifest_str' from outer scope (line 73)

(W0621)


[convention] 117-117: Missing function or method docstring

(C0116)


[warning] 117-117: Redefining name 'manifest_str' from outer scope (line 73)

(W0621)


[convention] 145-145: Missing function or method docstring

(C0116)


[convention] 161-161: Missing function or method docstring

(C0116)


[warning] 161-161: Redefining name 'manifest_str' from outer scope (line 73)

(W0621)


[convention] 174-174: Missing function or method docstring

(C0116)


[warning] 174-174: Redefining name 'manifest_str' from outer scope (line 73)

(W0621)


[convention] 190-190: Missing function or method docstring

(C0116)


[warning] 190-190: Redefining name 'manifest_str' from outer scope (line 73)

(W0621)


[convention] 203-203: Missing function or method docstring

(C0116)


[warning] 203-203: Redefining name 'manifest_str' from outer scope (line 73)

(W0621)


[convention] 217-217: Missing function or method docstring

(C0116)


[warning] 217-217: Redefining name 'manifest_str' from outer scope (line 73)

(W0621)


[convention] 232-232: Missing function or method docstring

(C0116)


[warning] 232-232: Redefining name 'manifest_str' from outer scope (line 73)

(W0621)


[convention] 244-244: Missing function or method docstring

(C0116)


[warning] 245-245: Redefining name 'manifest_str' from outer scope (line 73)

(W0621)


[convention] 268-268: Missing function or method docstring

(C0116)


[warning] 268-268: Redefining name 'manifest_str' from outer scope (line 73)

(W0621)


[convention] 281-281: Missing function or method docstring

(C0116)


[warning] 282-282: Redefining name 'manifest_str' from outer scope (line 73)

(W0621)


[convention] 307-307: Missing function or method docstring

(C0116)


[convention] 331-331: Missing function or method docstring

(C0116)


[convention] 343-343: Missing function or method docstring

(C0116)

⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: ci
🔇 Additional comments (19)
ibis-server/app/model/metadata/factory.py (2)

2-2: LGTM: Clean import following established pattern.

The import of AthenaMetadata follows the same pattern as other metadata imports and maintains alphabetical ordering.


21-21: LGTM: Consistent factory mapping implementation.

The mapping entry correctly associates DataSource.athena with AthenaMetadata, following the established factory pattern used by other data sources.

ibis-server/pyproject.toml (2)

14-14: LGTM: Correct dependency configuration for Athena support.

Adding "athena" to the ibis-framework extras properly enables the Athena backend support, following the same pattern as other database backends.


70-70: LGTM: Consistent pytest marker addition.

The athena test marker follows the established pattern and enables selective test execution for Athena-specific tests.

.github/workflows/ibis-ci.yml (1)

93-100: LGTM: Well-structured conditional Athena testing step.

The implementation correctly follows the established pattern used by BigQuery and Snowflake conditional tests. The environment variables appropriately map to AWS credentials and Athena-specific configuration needed for testing.

ibis-server/app/model/__init__.py (2)

39-41: LGTM: Consistent DTO implementation.

The QueryAthenaDTO class correctly follows the established pattern used by other database-specific query DTOs, properly typing the connection_info field.

🧰 Tools
🪛 Pylint (3.3.7)

[convention] 39-39: Missing class docstring

(C0115)


[refactor] 39-39: Too few public methods (0/2)

(R0903)


367-368: LGTM: Proper union type extension.

The addition of AthenaConnectionInfo to the ConnectionInfo union type is correctly placed and maintains consistency with the existing type union structure.

ibis-server/app/model/data_source.py (3)

14-14: LGTM! Imports are correctly added.

The imports for AthenaConnectionInfo and QueryAthenaDTO are properly included and follow the existing import pattern.

Also applies to: 23-23


45-45: LGTM! Enum additions follow the established pattern.

The athena enum members are correctly added to both DataSource and DataSourceExtension enums, maintaining consistency with other data sources.

Also applies to: 74-74

🧰 Tools
🪛 Pylint (3.3.7)

[convention] 45-45: Class constant name "athena" doesn't conform to UPPER_CASE naming style

(C0103)


104-112: LGTM! Connection method implementation is correct.

The get_athena_connection method properly:

  • Follows the established pattern for static connection methods
  • Correctly extracts secret values from the connection info
  • Uses the appropriate ibis.athena.connect parameters

The implementation is consistent with other connector methods in the file.

🧰 Tools
🪛 Pylint (3.3.7)

[convention] 105-105: Missing function or method docstring

(C0116)

ibis-server/app/model/metadata/athena.py (3)

17-21: LGTM! Class structure follows the established pattern.

The AthenaMetadata class properly inherits from Metadata and initializes the connection using the data source management system.

🧰 Tools
🪛 Pylint (3.3.7)

[convention] 17-17: Missing class docstring

(C0115)


93-95: LGTM! Constraint handling is correctly implemented.

Properly returns an empty list since Athena doesn't support foreign key constraints, with clear documentation.


97-98: LGTM! Version method provides appropriate information.

Returns a descriptive version string that accurately reflects AWS Athena's versioning approach.

ibis-server/tests/routers/v2/connector/test_athena.py (6)

9-19: LGTM! Test configuration is well-structured.

The test setup properly:

  • Uses pytest markers for test categorization
  • Configures environment-based connection info with sensible defaults
  • Follows established patterns for test configuration
🧰 Tools
🪛 Pylint (3.3.7)

[convention] 11-11: Constant name "base_url" doesn't conform to UPPER_CASE naming style

(C0103)


20-69: LGTM! Comprehensive test manifest structure.

The manifest provides excellent test coverage with:

  • Various column types (integer, varchar, float, date, timestamp, bytea)
  • Complex expressions and type casting
  • Edge cases like NULL values
  • Realistic test data structure
🧰 Tools
🪛 Pylint (3.3.7)

[convention] 43-43: Line too long (105/100)

(C0301)


77-102: LGTM! Thorough query result validation.

The test properly validates:

  • Response structure and status code
  • Data consistency with expected results
  • Column count matches manifest
  • Data types are correctly returned

The assertion structure is comprehensive and reliable.

🧰 Tools
🪛 Pylint (3.3.7)

[convention] 77-77: Missing function or method docstring

(C0116)


[warning] 77-77: Redefining name 'manifest_str' from outer scope (line 73)

(W0621)


117-143: LGTM! Limit parameter testing is well-implemented.

The test effectively validates that the limit parameter works correctly in both scenarios:

  • When applied via query parameter
  • When SQL already contains a LIMIT clause (parameter takes precedence)
🧰 Tools
🪛 Pylint (3.3.7)

[convention] 117-117: Missing function or method docstring

(C0116)


[warning] 117-117: Redefining name 'manifest_str' from outer scope (line 73)

(W0621)


145-305: LGTM! Comprehensive error scenario testing.

Excellent coverage of error cases:

  • Missing required fields (manifestStr, sql, connectionInfo)
  • Invalid parameters for validation rules
  • Proper HTTP status codes and error messages

This ensures robust error handling in the API.

🧰 Tools
🪛 Pylint (3.3.7)

[convention] 145-145: Missing function or method docstring

(C0116)


[convention] 161-161: Missing function or method docstring

(C0116)


[warning] 161-161: Redefining name 'manifest_str' from outer scope (line 73)

(W0621)


[convention] 174-174: Missing function or method docstring

(C0116)


[warning] 174-174: Redefining name 'manifest_str' from outer scope (line 73)

(W0621)


[convention] 190-190: Missing function or method docstring

(C0116)


[warning] 190-190: Redefining name 'manifest_str' from outer scope (line 73)

(W0621)


[convention] 203-203: Missing function or method docstring

(C0116)


[warning] 203-203: Redefining name 'manifest_str' from outer scope (line 73)

(W0621)


[convention] 217-217: Missing function or method docstring

(C0116)


[warning] 217-217: Redefining name 'manifest_str' from outer scope (line 73)

(W0621)


[convention] 232-232: Missing function or method docstring

(C0116)


[warning] 232-232: Redefining name 'manifest_str' from outer scope (line 73)

(W0621)


[convention] 244-244: Missing function or method docstring

(C0116)


[warning] 245-245: Redefining name 'manifest_str' from outer scope (line 73)

(W0621)


[convention] 268-268: Missing function or method docstring

(C0116)


[warning] 268-268: Redefining name 'manifest_str' from outer scope (line 73)

(W0621)


[convention] 281-281: Missing function or method docstring

(C0116)


[warning] 282-282: Redefining name 'manifest_str' from outer scope (line 73)

(W0621)


307-351: LGTM! Metadata endpoint testing is comprehensive.

The tests properly validate:

  • Table listing functionality
  • Constraint handling (correctly expects empty list for Athena)
  • Version information retrieval
  • Response structure validation

Good coverage of all metadata endpoints.

🧰 Tools
🪛 Pylint (3.3.7)

[convention] 307-307: Missing function or method docstring

(C0116)


[convention] 331-331: Missing function or method docstring

(C0116)


[convention] 343-343: Missing function or method docstring

(C0116)

Copy link
Copy Markdown
Contributor

@goldmedal goldmedal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @douenergy. Overall, it looks good to me. I left some minor comments. It's better to add the test case for the glue table.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
ibis-server/app/model/metadata/athena.py (1)

25-43: SQL injection vulnerability still exists despite previous discussion.

The direct string interpolation of schema_name remains a security concern. Since the previous parameterized approach didn't work with Ibis raw_sql, implement proper identifier validation and quoting as discussed in the previous review.

Apply the validation and quoting approach:

+    def _validate_and_quote_identifier(self, identifier: str) -> str:
+        """Validate and quote SQL identifier to prevent injection."""
+        import re
+        
+        # Remove any existing quotes and validate
+        clean_identifier = identifier.strip().strip('"').strip("'")
+        
+        # Only allow alphanumeric characters, underscores, and hyphens
+        if not re.match(r'^[a-zA-Z0-9_-]+$', clean_identifier):
+            raise ValueError(f"Invalid schema name: {clean_identifier}")
+        
+        # Quote the identifier using double quotes (standard SQL)
+        return f'"{clean_identifier}"'

     def get_table_list(self) -> list[Table]:
         schema_name = self.connection_info.schema_name.get_secret_value()
+        quoted_schema = self._validate_and_quote_identifier(schema_name)

         sql = f"""
             SELECT 
                 t.table_catalog,
                 t.table_schema,
                 t.table_name,
                 c.column_name,
                 c.ordinal_position,
                 c.is_nullable,
                 c.data_type
             FROM 
                 information_schema.tables AS t 
             JOIN 
                 information_schema.columns AS c 
                 ON t.table_catalog = c.table_catalog
                 AND t.table_schema = c.table_schema
                 AND t.table_name = c.table_name
-            WHERE t.table_schema = '{schema_name}'
+            WHERE t.table_schema = {quoted_schema}
             ORDER BY t.table_name
             """
🧹 Nitpick comments (4)
ibis-server/app/model/metadata/athena.py (3)

1-14: Add module docstring for better documentation.

Consider adding a module-level docstring to describe the purpose of this Athena metadata implementation.

+"""AWS Athena metadata implementation for retrieving tables, columns, and constraints."""
 import re

 import pandas as pd
🧰 Tools
🪛 Pylint (3.3.7)

[convention] 1-1: Missing module docstring

(C0114)


[error] 3-3: Unable to import 'pandas'

(E0401)


17-21: Add class docstring for better documentation.

The class would benefit from a docstring explaining its purpose and usage.

 class AthenaMetadata(Metadata):
+    """Metadata implementation for AWS Athena data source.
+    
+    Provides functionality to retrieve table definitions, column information,
+    and constraints from Athena's information_schema.
+    """
     def __init__(self, connection_info: AthenaConnectionInfo):
🧰 Tools
🪛 Pylint (3.3.7)

[convention] 17-17: Missing class docstring

(C0115)


103-137: Consider improving type mapping coverage and organization.

The type mapping includes some MySQL-specific types that may not be relevant for Athena. Consider organizing by category and ensuring Athena-specific types are properly covered.

     def _transform_column_type(self, data_type):
         data_type = re.sub(r"\(.*\)", "", data_type).strip()
         switcher = {
-            # String Types (ignore Binary and Spatial Types for now)
+            # String Types
             "char": RustWrenEngineColumnType.CHAR,
             "varchar": RustWrenEngineColumnType.VARCHAR,
-            "tinytext": RustWrenEngineColumnType.TEXT,
             "text": RustWrenEngineColumnType.TEXT,
-            "mediumtext": RustWrenEngineColumnType.TEXT,
-            "longtext": RustWrenEngineColumnType.TEXT,
-            "enum": RustWrenEngineColumnType.VARCHAR,
-            "set": RustWrenEngineColumnType.VARCHAR,
-            "bit": RustWrenEngineColumnType.TINYINT,
+            "string": RustWrenEngineColumnType.VARCHAR,  # Athena string type
+            
+            # Numeric Types
             "tinyint": RustWrenEngineColumnType.TINYINT,
             "smallint": RustWrenEngineColumnType.SMALLINT,
-            "mediumint": RustWrenEngineColumnType.INTEGER,
             "int": RustWrenEngineColumnType.INTEGER,
             "integer": RustWrenEngineColumnType.INTEGER,
             "bigint": RustWrenEngineColumnType.BIGINT,
-            # boolean
+            
+            # Boolean
             "bool": RustWrenEngineColumnType.BOOL,
             "boolean": RustWrenEngineColumnType.BOOL,
-            # Decimal
+            
+            # Decimal/Float
             "float": RustWrenEngineColumnType.FLOAT4,
             "double": RustWrenEngineColumnType.DOUBLE,
             "decimal": RustWrenEngineColumnType.DECIMAL,
             "numeric": RustWrenEngineColumnType.NUMERIC,
+            
+            # Date/Time
             "date": RustWrenEngineColumnType.DATE,
             "datetime": RustWrenEngineColumnType.TIMESTAMP,
             "timestamp": RustWrenEngineColumnType.TIMESTAMPTZ,
-            # JSON Type
+            
+            # JSON
             "json": RustWrenEngineColumnType.JSON,
+            
+            # Binary
+            "binary": RustWrenEngineColumnType.BYTEA,
+            "varbinary": RustWrenEngineColumnType.BYTEA,
         }
ibis-server/tests/routers/v2/connector/test_athena.py (1)

1-9: Add module docstring to describe test purpose.

Consider adding a module-level docstring to describe what this test module covers.

+"""Tests for Athena connector v2 API endpoints.
+
+Tests cover query execution, validation rules, and metadata retrieval
+for both standard Athena and Glue-based databases.
+"""
 import base64
 import os
🧰 Tools
🪛 Pylint (3.3.7)

[convention] 1-1: Missing module docstring

(C0114)


[error] 4-4: Unable to import 'orjson'

(E0401)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7dd1b5b and b47eaba.

📒 Files selected for processing (2)
  • ibis-server/app/model/metadata/athena.py (1 hunks)
  • ibis-server/tests/routers/v2/connector/test_athena.py (1 hunks)
🧰 Additional context used
🪛 Pylint (3.3.7)
ibis-server/app/model/metadata/athena.py

[convention] 45-45: Line too long (103/100)

(C0301)


[convention] 1-1: Missing module docstring

(C0114)


[error] 3-3: Unable to import 'pandas'

(E0401)


[convention] 17-17: Missing class docstring

(C0115)

ibis-server/tests/routers/v2/connector/test_athena.py

[convention] 54-54: Line too long (105/100)

(C0301)


[warning] 118-118: ## fixme this should be float64

(W0511)


[warning] 158-158: ## fixme this should be float64

(W0511)


[convention] 1-1: Missing module docstring

(C0114)


[error] 4-4: Unable to import 'orjson'

(E0401)


[convention] 11-11: Constant name "base_url" doesn't conform to UPPER_CASE naming style

(C0103)


[convention] 84-84: Missing function or method docstring

(C0116)


[convention] 88-88: Missing function or method docstring

(C0116)


[warning] 88-88: Redefining name 'manifest_str' from outer scope (line 84)

(W0621)


[convention] 128-128: Missing function or method docstring

(C0116)


[warning] 128-128: Redefining name 'manifest_str' from outer scope (line 84)

(W0621)


[convention] 168-168: Missing function or method docstring

(C0116)


[warning] 168-168: Redefining name 'manifest_str' from outer scope (line 84)

(W0621)


[convention] 196-196: Missing function or method docstring

(C0116)


[convention] 212-212: Missing function or method docstring

(C0116)


[warning] 212-212: Redefining name 'manifest_str' from outer scope (line 84)

(W0621)


[convention] 225-225: Missing function or method docstring

(C0116)


[warning] 225-225: Redefining name 'manifest_str' from outer scope (line 84)

(W0621)


[convention] 241-241: Missing function or method docstring

(C0116)


[warning] 241-241: Redefining name 'manifest_str' from outer scope (line 84)

(W0621)


[convention] 254-254: Missing function or method docstring

(C0116)


[warning] 254-254: Redefining name 'manifest_str' from outer scope (line 84)

(W0621)


[convention] 268-268: Missing function or method docstring

(C0116)


[warning] 268-268: Redefining name 'manifest_str' from outer scope (line 84)

(W0621)


[convention] 283-283: Missing function or method docstring

(C0116)


[warning] 283-283: Redefining name 'manifest_str' from outer scope (line 84)

(W0621)


[convention] 295-295: Missing function or method docstring

(C0116)


[warning] 296-296: Redefining name 'manifest_str' from outer scope (line 84)

(W0621)


[convention] 319-319: Missing function or method docstring

(C0116)


[warning] 319-319: Redefining name 'manifest_str' from outer scope (line 84)

(W0621)


[convention] 332-332: Missing function or method docstring

(C0116)


[warning] 333-333: Redefining name 'manifest_str' from outer scope (line 84)

(W0621)


[convention] 358-358: Missing function or method docstring

(C0116)


[convention] 382-382: Missing function or method docstring

(C0116)


[convention] 394-394: Missing function or method docstring

(C0116)

⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: ci
🔇 Additional comments (4)
ibis-server/tests/routers/v2/connector/test_athena.py (4)

88-126: Comprehensive query test with good data validation.

The test effectively validates query execution, column count, data content, and data types. The assertions are thorough and check both structure and content.

🧰 Tools
🪛 Pylint (3.3.7)

[warning] 118-118: ## fixme this should be float64

(W0511)


[convention] 88-88: Missing function or method docstring

(C0116)


[warning] 88-88: Redefining name 'manifest_str' from outer scope (line 84)

(W0621)


128-166: Excellent addition of Glue database testing.

Testing with Glue-based databases ensures the connector works with different Athena deployment patterns. The test structure mirrors the standard test appropriately.

🧰 Tools
🪛 Pylint (3.3.7)

[warning] 158-158: ## fixme this should be float64

(W0511)


[convention] 128-128: Missing function or method docstring

(C0116)


[warning] 128-128: Redefining name 'manifest_str' from outer scope (line 84)

(W0621)


358-380: Robust metadata table listing test.

The test properly validates the metadata response structure and includes conditional logic to check for the test table's existence, which makes it resilient to different test environments.

🧰 Tools
🪛 Pylint (3.3.7)

[convention] 358-358: Missing function or method docstring

(C0116)


382-392: Correct validation of Athena constraints limitation.

The test correctly validates that Athena returns an empty constraints list, which aligns with Athena's lack of foreign key constraint support.

🧰 Tools
🪛 Pylint (3.3.7)

[convention] 382-382: Missing function or method docstring

(C0116)

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
ibis-server/app/model/metadata/athena.py (1)

26-44: The SQL injection vulnerability remains unaddressed.

Based on the previous discussion, parameterized queries don't work with Ibis raw_sql. The validation and quoting approach suggested in the follow-up was not implemented.

Consider implementing the validation approach:

def _validate_and_quote_identifier(self, identifier: str) -> str:
    """Validate and quote SQL identifier to prevent injection."""
    import re
    clean_identifier = identifier.strip().strip('"').strip("'")
    
    if not re.match(r'^[a-zA-Z0-9_-]+$', clean_identifier):
        raise ValueError(f"Invalid schema name: {clean_identifier}")
    
    return f'"{clean_identifier}"'

Then use: WHERE t.table_schema = {self._validate_and_quote_identifier(schema_name)}

🧹 Nitpick comments (3)
ibis-server/app/model/metadata/athena.py (3)

1-16: Add module and class documentation.

The static analysis correctly identifies missing docstrings. Add proper documentation to improve code maintainability.

+"""
+AWS Athena metadata implementation for retrieving schema information.
+
+This module provides AthenaMetadata class that connects to AWS Athena
+and retrieves table and column metadata using information_schema queries.
+"""
 import re
 from contextlib import closing

 import pandas as pd
🧰 Tools
🪛 Pylint (3.3.7)

[convention] 1-1: Missing module docstring

(C0114)


[error] 4-4: Unable to import 'pandas'

(E0401)


18-22: Add class documentation.

 class AthenaMetadata(Metadata):
+    """
+    Metadata implementation for AWS Athena data source.
+    
+    Provides methods to retrieve table schemas, columns, and constraints
+    from Athena using information_schema queries.
+    """
     def __init__(self, connection_info: AthenaConnectionInfo):
🧰 Tools
🪛 Pylint (3.3.7)

[convention] 18-18: Missing class docstring

(C0115)


46-47: Fix line length formatting issue.

-        # We need to use raw_sql here because using the sql method causes Ibis to *create  view* first,
-        # which does not work with information_schema queries.
+        # We need to use raw_sql here because using the sql method causes Ibis to
+        # *create view* first, which does not work with information_schema queries.
🧰 Tools
🪛 Pylint (3.3.7)

[convention] 46-46: Line too long (103/100)

(C0301)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b47eaba and 02f1f72.

📒 Files selected for processing (2)
  • ibis-server/app/model/connector.py (3 hunks)
  • ibis-server/app/model/metadata/athena.py (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • ibis-server/app/model/connector.py
🧰 Additional context used
🪛 Pylint (3.3.7)
ibis-server/app/model/metadata/athena.py

[convention] 46-46: Line too long (103/100)

(C0301)


[convention] 1-1: Missing module docstring

(C0114)


[error] 4-4: Unable to import 'pandas'

(E0401)


[convention] 18-18: Missing class docstring

(C0115)

⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: ci
🔇 Additional comments (1)
ibis-server/app/model/metadata/athena.py (1)

48-51: Good implementation of cursor management.

The use of closing() context manager properly addresses the resource management concern from previous reviews.

Copy link
Copy Markdown
Contributor

@goldmedal goldmedal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @douenergy. 👍

@goldmedal goldmedal merged commit 0803840 into Canner:main Jun 4, 2025
3 of 4 checks passed
@coderabbitai coderabbitai bot mentioned this pull request Oct 3, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

athena ci dependencies Pull requests that update a dependency file ibis python Pull requests that update Python code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants